Preface

The work described in this report was performed by the Information
Systems Division of the Jet Propulsion Laboratory.
Abstract

A model of robot learning is described that associates previously
unknown perceptions with the sensed known consequences of robot actions. For
these actions, both the categories of outcomes and the corresponding sensory
patterns are incorporated in a knowledge base by the system designer. Thus
the robot is able to predict the outcome of an action and compare the
expectation with the experience. New knowledge about what to expect in the world
may then be incorporated by the robot in a pre-existing structure whether it
detects accordance or discrepancy between a predicted consequence and
experience. Errors committed during plan execution are detected by the same type
of comparison process, and learning may be applied to avoiding the errors.

The model is being implemented as a system called RECOGNIZER, and will be
incorporated into the existing JPL robot system so that its performance may
be tested in real situations.


Descriptive Terms: robot learning, error correction, partial matching,
association, recognizable states.
INTRODUCTION


We describe a learning paradigm designed to improve the performance of a
robot in a partially unpredictable environment. We will discuss closely related
work on error correction, showing how the learning process may be applied to it,
and how research in partial matching of patterns can be used to provide the
necessary support for the learning process once it has been initiated. The
inspiration for the work on learning and error correction reported here has
been the JPL Robotics Research Program. A robot system has been under development
at JPL for five years and is now fully operational, integrating vision and scene
analysis subsystems with both manipulation and locomotion (see Fig. 1). A brief
overview of the robot's system organization is given in Thompson's paper on
robot navigation (Thompson, T1).

The learning paradigm, which is being implemented in a system called
RECOGNIZER, has been described elsewhere with some indications of its
application to modeling biological learning (Friedman, F1). Here we are concerned
with its interactions with error correction and partial matching. The starting
point for both the learning and error correction processes is the recognizable
state. By recognizable we mean two things. First, an internally stored model
of the state exists. Second, a process for matching sensory inputs against
the internal model also exists.
RECOGNIZER

RECOGNIZER will associate object descriptions or perceptions from the
scene analysis system with perceived consequences of robot actions. In order
to make clear what is meant by learning in this paper, we will first describe
elements of the Common Sense Algorithm (CSA) language in which RECOGNIZER is
being programmed. CSA is a high-level language system under development by
C. J. Rieger for use in natural language understanding (Rieger, R1, R2, R3).

The JPL robot employs, in addition to various support procedures, a subset
of procedures that accomplish useful functions in scene analysis, manipulation
and locomotion. A member of this procedure subset is called an "action." A
string of actions, called a "plan," can achieve specific goals of a human
operator. In order that a plan-synthesizer may be able to construct a plan in
RECOGNIZER, knowledge about robot actions is provided by the designer and
stored as a CSA form. This form is a triple, linking the name of a robot action
and its parameter list with the name of the state it produces via a causal
link (R1). The form also includes slots for preconditions or gates. These are
states that must be true if the action is to produce its intended effect. For
useful plans to be constructed, the uninstantiated algorithms must be selected
and instantiated. A decision net for each goal-state performs this function.
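
As an illustration only, the form just described might be pictured as the following Python record. This is a sketch and not CSA itself: the class name `CausalForm`, its field names, and the example state names are all assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CausalForm:
    """Hypothetical stand-in for a CSA form: action --causes--> state, plus gates."""
    action: str          # name of the robot action
    params: tuple        # parameter list of the action
    produces: str        # name of the state the action causes
    gates: tuple = ()    # states that must be true for the intended effect

    def enabled(self, true_states):
        """Return True when every gating state is currently true."""
        return all(g in true_states for g in self.gates)

# Example form for "grasp"; the state names are invented for illustration.
grasp = CausalForm("grasp", ("obj",), "holding(obj)",
                   gates=("hand-at(obj)", "hand-open"))
print(grasp.enabled({"hand-open"}))
print(grasp.enabled({"hand-open", "hand-at(obj)"}))
```

The gates are kept separate from the causal triple so that a planner can ask which preconditions are still untrue before committing to the action.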

The selection net for a goal-state is called a "causal" net and consists of
nodes, arcs, and the terminal algorithms. A test performed at each node
chooses the arc to a successor node and to another test, eventually reaching a
terminal algorithm (see Fig. 2). When the CSA plan-synthesizer receives a
request to make a goal state true, it traverses the corresponding causal net to
a given algorithm, examining that algorithm for gating state conditions not
already true. For each of the gating states in turn, the process of traversing
its causal net is repeated till an action to make each gating condition true is
found. The synthesizer then links the actions in proper order to make a plan
that accomplishes the desired goal state. There is also a generalized "demon"
capability provided.

Fig. 2. A CSA decision net
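
The recursive treatment of gating states can be sketched as follows. This is a simplified Python illustration under stated assumptions: each causal net is collapsed to a single terminal algorithm (a real net would run tests at its nodes to select one), and the table `CAUSAL_NETS`, the function `synthesize`, and all state names are hypothetical.

```python
# Hypothetical knowledge base: goal state -> (action, gating states).
CAUSAL_NETS = {
    "in-box(rock1)": ("ungrasp rock1", ["over-box(rock1)"]),
    "over-box(rock1)": ("move-object rock1", ["holding(rock1)"]),
    "holding(rock1)": ("grasp rock1", ["located(rock1)"]),
    "located(rock1)": ("find rock1", []),
}

def synthesize(goal, true_states, plan=None):
    """Return an ordered list of actions intended to make `goal` true."""
    plan = [] if plan is None else plan
    if goal in true_states:
        return plan                  # nothing to do for this state
    action, gates = CAUSAL_NETS[goal]
    for gate in gates:               # first make each untrue gating state true
        synthesize(gate, true_states, plan)
    plan.append(action)
    true_states.add(goal)            # assume the action achieves its state
    return plan

plan = synthesize("in-box(rock1)", set())
print(plan)
```

Linking the actions "in proper order" falls out of the recursion: an action is appended only after the actions that satisfy its gates.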

This brief description of the language suffices for our definitions and
we can now describe RECOGNIZER itself. RECOGNIZER incorporates a causal net
for each action in the robot's repertoire. Other decision nets are also employed
in the system. For each robot action, there is an "outcome" net. This is a
decision net that terminates with measurable predictions of what may happen
as the result of an action. The predictions take the form of more-or-less
directly sensed input parameters such as "finger touch sensors 1 and 2 are off"
or of higher-level perceptions inferred from these patterns such as "unsupported
rock." Still another type of net is the semantic decision net which selects for
perception categories based on the descriptions constructed from the sensory
input. One semantic net infers useful property categories of objects perceived,
another the existence of conditions leading to the commission of errors.

Each semantic net furnishes a corresponding outcome net with the
information needed to make a decision about what is the expected outcome, selecting
from all known outcomes of an action. If such a partnership exists for every
robot action, expectation can be compared with experience. (Specific examples
of outcome nets and semantic nets are given later.)

The specification of the categories of objects that the robot needs to
know and the kinds of error states that it can readily detect requires close
study of actual robot experience by the semantic and outcome net designer and
an intimate knowledge of robot subsystem design. With this knowledge, he can
specify particular outcome states the robot can measure without knowing in
advance the nature of the object or environmental state that will produce that
outcome state.

One additional CSA feature facilitates learning. This is the ability to
perform "reverse search" of decision nets. The net is normally traversed from
top to bottom, with an initial test leading to an arbitrary number of further
tests, terminating in some kind of executable statement representation
(terminal algorithm) or datum. In CSA, it is possible, after making such a
traverse, to start at the termination and retrace the path actually followed in
reverse, because the result of each test has been remembered. By arranging a
system which plans action strings from a knowledge base of causal nets and
which has some expectation of what it will sense as the result of each action it
will take, we can relate what is perceived during execution with the anticipated
sensed consequences. Reverse search enables us to locate critical branches at
which to place the learned perceptions.
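
A minimal Python sketch of this idea follows, assuming only what the paragraph states: a top-to-bottom traverse that remembers the result of each test, so that the path can be retraced from the termination. The `Node` class, the two-node net, and the attribute names are hypothetical and do not reproduce CSA's actual mechanism.

```python
class Node:
    """Decision-net node: a test chooses the label of the outgoing arc."""
    def __init__(self, name, test, arcs):
        self.name = name
        self.test = test      # function: percept -> arc label
        self.arcs = arcs      # arc label -> Node or terminal string

def traverse(root, percept):
    """Walk top to bottom, remembering (node name, arc label) for each test."""
    path, node = [], root
    while isinstance(node, Node):
        label = node.test(percept)
        path.append((node.name, label))
        node = node.arcs[label]
    return node, path         # terminal datum and the remembered path

def reverse_search(path):
    """Retrace the remembered path from the termination back to the top."""
    return list(reversed(path))

n2 = Node("N2", lambda p: "yes" if p.get("shiny") else "no",
          {"yes": "fragile", "no": "hard"})
n1 = Node("N1", lambda p: "yes" if p.get("huge") else "no",
          {"yes": "heavy", "no": n2})

terminal, path = traverse(n1, {"shiny": True})
print(terminal, reverse_search(path))
```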

WHAT CHANGES AS A RESULT OF LEARNING 

Two forms of learning will be discussed: learning how to categorize
specific unknown perceptions and acquisition of stimulus-response pairs. In
categorizing perceptions the tests resident at a given node of a semantic net
are subject to modification. The net structure (number of nodes, the arcs
leading to successor nodes and the terminations) remains unchanged. At the
start, before learning, most of the nodes will have only default tests; i.e.,
they will have no templates to match against a pattern perceived externally.
When no templates are present that match, the default test points to an arc
that is most likely when the robot's world is behaving normally. A succession
of default choices leads to a terminal perception category that is most likely.
We make the assumption that the environment is regular enough to justify
pre-selection of a normal or default node.

After perception learning takes place, there will be templates at those
nodes where there were initially only defaults. When an incoming perception
matches such a template, a non-default arc is chosen, leading to a non-normal
perception category termination.
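
The difference between a node before and after perception learning can be illustrated with a short Python sketch. It assumes that a template is simply a set of attributes that must all be present in the percept; the class `SemanticNode` and the attribute names are hypothetical.

```python
class SemanticNode:
    """Node whose test is a list of learned templates plus a default arc."""
    def __init__(self, default_arc):
        self.default_arc = default_arc
        self.templates = []          # (required attribute set, arc) pairs

    def add_template(self, attributes, arc):
        self.templates.append((frozenset(attributes), arc))

    def choose(self, percept):
        """Pick the arc of the first matching template, else the default."""
        for attributes, arc in self.templates:
            if attributes <= set(percept):   # every template attribute present
                return arc
        return self.default_arc

node = SemanticNode(default_arc="hard")
before = node.choose({"spherical", "shiny"})      # naive node: default arc
node.add_template({"spherical", "shiny"}, "fragile")
after = node.choose({"spherical", "shiny", "red"})
print(before, after)
```

Note that the net structure is untouched by `add_template`; only the test resident at the node changes, as in the first form of learning.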

During stimulus-response acquisition, the structure of the semantic net is
modified to add new terminations as well as new templates at the nodes.

When and how the templates are generated by RECOGNIZER and how they are
positioned on the appropriate node are described next.


A PERCEPTION LEARNING SCENARIO

A semantic net before learning is shown in Fig. 3. The net provides
for a matching of visual perception patterns and can potentially select for
intrinsic object properties that affect manipulator performance. The net
shown selects for the properties "heavy," "fragile," "sticky" and "hard."
"Hard" is the default termination, and will always be selected as the expected
category at the outset. Thus the robot will respond by trying to grasp all
objects it is commanded to manipulate.

To initiate a learning experience, the robot may be commanded to "pick
up rock 1 and put it in the box." The plan-synthesizer will then generate
a string of actions including "analyze scene," "find rock 1," "grasp rock 1,"
"move-object rock 1" and "ungrasp rock 1." Figure 4 shows an outcome net
associated with the action "grasp." An execution monitor looks at each action
in the plan stack before it is executed. It then activates the corresponding
outcome net. In addition to the outcome net, a trigger-tree, TT1, is activated
by the monitor. Trigger-trees are CSA constructs and consist of packets of
demons (Rieger, R3). A demon in TT1 will be on the alert for each combination
of sensory inputs shown in the terminations of the outcome net. Thus each
termination is a recognizable state. Note that for "grasp" alone the sensors
available cannot distinguish between "hard," "sticky" and "heavy." In effect,
several categories can be inferred from the action "grasp." These can be
disambiguated by subsequent actions.

Fig. 3. Robot semantic net before learning
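
The ambiguity of "grasp" can be made concrete with a small Python sketch. The sensor patterns below are invented for illustration and are not the terminations of Fig. 4; the names `GRASP_OUTCOMES`, `triggered_demon` and `compare` are likewise assumptions.

```python
# Hypothetical sensor patterns for the terminations of a "grasp" outcome net.
# Keys are (touch sensors on, fingers fully closed); values are the property
# categories consistent with that pattern.
GRASP_OUTCOMES = {
    (True, False): {"hard", "sticky", "heavy"},   # object held between fingers
    (True, True): {"fragile"},                    # fingers closed on a broken object
    (False, True): {"missed"},                    # nothing between the fingers
}

def triggered_demon(touch_on, fully_closed):
    """Return the categories whose demon fires for this sensor pattern."""
    return GRASP_OUTCOMES.get((touch_on, fully_closed), set())

def compare(expected, touch_on, fully_closed):
    """Trigger-monitor check: is the expected category among those sensed?"""
    return expected in triggered_demon(touch_on, fully_closed)

print(compare("hard", True, False))   # expectation consistent with experience
print(compare("hard", True, True))    # discrepancy between the two
```

Because one pattern maps to three categories, a confirmed "hard" expectation after "grasp" remains provisional until later actions have been executed.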

Before "grasp" is executed, as part of the process of scene analysis and
segmentation to find rock 1, the semantic net will make a selection to
categorize the object. When the robot is "naive" (before any experience) the
semantic net choice will inevitably be "hard object." When the next action is
"grasp," its outcome net uses the semantic net selection to make the choice of
"hard expected." As "grasp" is executed, the activated demons report to a
trigger monitor which compares the demon actually triggered with the expected
outcome perceptions. For simplicity, assume that the early experience of the
robot will be only with a variety of hard, non-fragile objects.

After each exercise of "grasp," the trigger monitor asks the scene
analysis system for its description of the object grasped. The scene analysis
system, DABI, designed by Yakimovsky and Cunningham, is working now in the
robot system and operates with a library of primitive attributes, specified in
advance (Yakimovsky and Cunningham, YC1). An attribute list that is
implementable might include shape, size, texture, color pattern, and symmetry.

The trigger monitor receives advice from the outcome net in Fig. 4 to wait at
least till "ungrasp" for the next step in learning. For both "move-object" and
"ungrasp," the outcomes (Figs. 5 and 6) confirm "hard movable object." Therefore
the trigger monitor can proceed. By reverse search, starting at the confirmed
termination "hard object," a monitor function climbs the semantic net, placing
the description found by scene analysis at each node containing a default till
it gets to the highest node in the net. If the experience is repeated, a process
capable of determining the common attributes and relations and eliminating
differing attributes is employed to revise the test for attributes present in
hard objects. Hayes-Roth has described programs for similarity and difference
matching (partial matching) between patterns that will do this job (Hayes-Roth,
HR1; Hayes-Roth and McDermott, HR-McD1). For the property lists we are talking
about here, simple bit operations suffice, performed on binary vectors representing
presence or absence of attributes. For more complex relations, his algorithms
search the problem space efficiently, and will be employed in RECOGNIZER.
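
The "simple bit operations" case can be sketched in Python as follows. This covers only the property-list case mentioned above, not Hayes-Roth's algorithms for more complex relations; the attribute library `ATTRIBUTES` and the function names are hypothetical.

```python
# Hypothetical attribute library; bit i stands for ATTRIBUTES[i].
ATTRIBUTES = ["spherical", "smooth", "shiny", "red", "rough", "gray"]

def encode(present):
    """Pack a set of attribute names into an integer bit vector."""
    return sum(1 << i for i, name in enumerate(ATTRIBUTES) if name in present)

def decode(bits):
    """Unpack a bit vector into the set of attribute names it represents."""
    return {name for i, name in enumerate(ATTRIBUTES) if (bits >> i) & 1}

def common(descriptions):
    """Attributes present in every description (bitwise AND)."""
    bits = (1 << len(ATTRIBUTES)) - 1
    for d in descriptions:
        bits &= encode(d)
    return bits

def difference(new, template_bits):
    """Attributes of `new` that are absent from the template (AND NOT)."""
    return encode(new) & ~template_bits

# Two experiences with hard objects, then one differing description.
hard = common([{"rough", "gray", "spherical"}, {"rough", "gray"}])
print(decode(hard))
print(decode(difference({"spherical", "smooth", "shiny", "red"}, hard)))
```

Repeating `common` over successive experiences eliminates differing attributes, which is the revision of the test for hard objects described above.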

Now the stage is set for learning about exceptional properties such as
"fragile." Suppose the robot is commanded to pick up a Christmas tree ornament.
It grasps with normal pressure and breaks it. At this point, the trigger
monitor discovers that the demon corresponding to a fragile broken object has
been triggered and that this is not the expected outcome. Once again it requests
the object description from scene analysis, but now starts its reverse search
of the semantic net (Fig. 3) from the termination "fragile object." It is
looking for the last node common to the path that scene analysis took in the
normal direction (to hard object) and the path to the actually experienced
termination (fragile object). The monitor can find this node because each
time the scene analysis system traverses the semantic net, it leaves an updated
marker at each node of the path taken. All the trigger monitor has to do is
climb from the termination "fragile object" to N3 in Fig. 3 to find the current
path marker. This node is where it will locate its test for "fragile." The
trigger monitor then calls the partial matching process to examine the tests for
hardness (non-fragility) at N3. The partial matching process will now seek to
find the differences between hard and fragile objects by comparing the N3 tests
already present with the scene analysis characterization of the ornament as a
spherical, smooth, shiny, red object. If a difference set cannot be found,
the partial matcher may request more detailed attributes of scene analysis.
This is possible because DABI operates with a resource allocation algorithm
that controls the time spent and depth of tree search. Thus in a first pass
the object might be characterized as spherical. More in-depth analysis would
add "a small cylinder sticks out of the sphere." If, on the basis of some
predetermined criterion, a distinctively different attribute set description of
a fragile object can be found after partial matching a limited number of times,
it will be placed at N3, overriding possible similar descriptions for hard
objects placed there earlier. Thus the next time the robot is commanded to
pick up a similar ornament, its semantic net will choose "fragile object," and
its outcome net, by selecting "fragile" expectancies, will find advice for the
execution monitor to "grasp with minimum pressure," advice that was not found
during its first experience with an ornament until too late.
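
The two steps of this scenario, locating the last common node and planting a distinguishing template there, can be sketched in Python. The net topology in `PATH_TO` is invented (the actual net is the one in Fig. 3), and the difference set is computed by plain set subtraction as a stand-in for the partial matcher.

```python
# Hypothetical net: each termination lists the nodes on the path leading to it.
PATH_TO = {
    "hard object": ["N1", "N2", "N3"],
    "fragile object": ["N1", "N2", "N3"],
    "heavy object": ["N1"],
    "sticky object": ["N1", "N2"],
}

def last_common_node(marked_path, experienced):
    """Climb from the experienced termination to the first marked node."""
    marked = set(marked_path)
    for node in reversed(PATH_TO[experienced]):
        if node in marked:
            return node
    return None

templates = {}   # node -> list of (attribute set, category), newest first

def learn_exception(marked_path, experienced, description, known):
    """Place a template for the distinguishing attributes, if any exist."""
    node = last_common_node(marked_path, experienced)
    distinctive = description - known
    if node is None or not distinctive:
        return None                    # no difference set: ask for more detail
    # Inserting at the front lets the new test override earlier ones.
    templates.setdefault(node, []).insert(0, (frozenset(distinctive), experienced))
    return node

node = learn_exception(PATH_TO["hard object"], "fragile object",
                       {"spherical", "smooth", "shiny", "red"}, {"spherical"})
print(node, templates[node])
```

A return value of `None` corresponds to the case where the partial matcher must request more detailed attributes from scene analysis.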

Similar outcome nets are shown in Figs. 5 and 6 for the actions "move- 
object" and "ungrasp." These subsequent actions, as already pointed out, serve 
to disambiguate the properties "too-heavy" and "sticky" from "hard." Once 
tests for such objects are learned, the predicted expectancies contain advice 
to inhibit the execution monitor from proceeding further with a planned grasp. 
Fig. 5. Robot outcome net for "move-object"
UNDERLYING PRINCIPLES

There are several underlying principles of the learning paradigm. First,
recognizable state outcomes that are independent of what is to be learned are
associated with an action (or string of actions). Second, to be useful, the
recognizable states must relate to goals of the system such as avoiding danger,
correcting errors, locating energy, etc. Third, the recognizable state must be
coupled with an action that increases the likelihood that what is to be learned
is properly segmented or isolated from the total sensory input. (The designer
can only anticipate perception-outcome relationships that are likely, not
guaranteed.) Thus "grasp" relates to an object whose intrinsic properties (such
as weight) may be recognizable via actions (such as move-object) and specific
sensory stimuli (such as manipulator motor current overload) divorced from the
object's appearance, but the appearance of the object grasped may then become
useful information, allowing the machine to avoid further overloads.


ERROR CORRECTION 

We turn now to error correction, a subject closely connected to learning,
and give an overview of the approach adopted by S. Srinivas (and to be included
in RECOGNIZER) for correcting execution errors in robot performance (Srinivas, S1).
His starting point is also the recognizable state. For each action of the JPL
robot he stores a list of possible error states and triggering perceptions
actually available in the existing system. For example, the action
"move-hand-to-grasp" can be associated with six foreseeable error states. The hand
could miss the object to be grasped entirely, left or right fingers could bump into
it, etc. Ambiguities similar to those of the recognizable states described
in the section on learning also exist here, due to the imperfect knowledge
supplied by the available sensory devices. If in "move-hand-to-grasp" the hand
missed entirely, the JPL robot would only know actual hand position (from its
angle-sensing pots) relative to desired hand position. It would have to execute
the next action, "grasp," to resolve the ambiguity between correct placement
and a complete miss.

Srinivas applies two basic strategies for correcting errors after having
detected them. These are failure reason analysis and multiple outcome analysis.
In "failure reason analysis" he seeks to determine automatically why the failure
occurred by examining the history of actions preceding the failure. When the
reason for failure is known, a corrective action can usually be associated with
it. The second strategy ignores why and seeks to characterize the nature of the
error state: what exactly is the error? Sometimes, simply knowing what is
wrong may point to a correction. It appears to be impossible to know in advance
which of these strategies (if either) will find a proper course of action to
correct the error.

Failure reason analysis is accomplished by synthesizing a tree of causally
linked failure reasons and actions. A knowledge base of possible failures for
each robot action is provided. These are classified into operational,
precondition, information, and constraint errors. Starting with the action at which
failure was detected, its associated list of possible failures becomes a candidate
for the tree. Some classes of error are causally linked to previous actions.
For example, an "incorrect information supplied" reason has the link "incorrectly
provided by" which points to a previous action. Before adding a candidate
failure reason to the tree, it is pruned, if possible, by a variety of
techniques. One method is to examine the sensory manifestations experienced during
the performance of the specific action. A manifestation selection net based
on study of that action will point to some of the failure reasons of the
candidate list as being more probable than others. Usually the manifestation
net will rule out some reasons. The failure tree synthesizer is linked to
a trace of previous actions. Once a layer of failure reasons is accepted,
those failures causally linked to previous actions provide the actions for
the next round of synthesis and pruning. The number of layers added to the
tree is limited by the finite trace maintained. If the tree can be pruned
enough to narrow the reasons for error to a single cause, a proper course of
action for correcting that error is usually determinable in advance and stored
with the error.
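
The layer-by-layer synthesis and pruning can be sketched in Python under simplifying assumptions: a causal link always points to the immediately preceding action in the trace, and the manifestation net is replaced by a given set of ruled-out reasons. The table `FAILURES` and all failure names are hypothetical and do not come from Srinivas's knowledge base.

```python
# Hypothetical knowledge base: action -> candidate failure reasons.
# Each reason is (name, class, linked_to_previous_action).
FAILURES = {
    "grasp": [("fingers slipped", "operational", False),
              ("wrong object position", "information", True)],
    "find": [("camera miscalibrated", "operational", False),
             ("occluded view", "constraint", False)],
}

def build_failure_tree(trace, failed_index, ruled_out):
    """Return layers of surviving failure reasons, newest action first."""
    layers, index = [], failed_index
    while index >= 0:                  # bounded by the finite trace
        action = trace[index]
        kept = [r for r in FAILURES.get(action, []) if r[0] not in ruled_out]
        layers.append((action, [r[0] for r in kept]))
        if not any(linked for _, _, linked in kept):
            break                      # nothing points further back in the trace
        index -= 1                     # follow the causal link backwards
    return layers

tree = build_failure_tree(["find", "grasp"], 1,
                          ruled_out={"fingers slipped", "occluded view"})
print(tree)
```

In this run each layer is pruned to a single reason, the situation in which a stored corrective action could be looked up directly.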

Multiple outcome analysis seeks to characterize what the error state is
by performing additional "inexpensive" tests, when necessary; i.e., causing the
robot to execute additional actions for the sole purpose of adding information
about the nature of the error committed. This may be needed if the triggering
recognizable state indicates an error ambiguously.

If either failure reason analysis or multiple outcome analysis has found
a solution pointing to a course of action, the planner goes to work patching
in error corrections to the action plan. To achieve the necessary preconditions
for the failed action, it may have to undo some actions as well as redo others.
Thus, if the failure was associated with "grasp" and the fingers were closed
before a failure was detected, they would have to be opened again before
retrying "grasp." The resultant "undo" and "redo" steps and new actions are
patched into the previous plan and execution is resumed.
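
A minimal Python sketch of the patching step follows. The table of inverse actions `UNDO`, the function `patch_plan`, and the action names other than "grasp," "move-object" and "ungrasp" are assumptions made for illustration.

```python
# Hypothetical inverse actions used to restore preconditions.
UNDO = {"grasp": "open-fingers", "move-object": "move-object-back"}

def patch_plan(plan, failed_index, correction):
    """Return the remaining plan with undo, correction and redo steps patched in."""
    failed = plan[failed_index]
    patch = []
    if failed in UNDO:
        patch.append(UNDO[failed])     # undo the partial effect of the failure
    patch.extend(correction)           # new corrective actions
    patch.append(failed)               # redo the failed action
    return patch + plan[failed_index + 1:]

plan = ["find", "grasp", "move-object", "ungrasp"]
print(patch_plan(plan, 1, ["reposition-hand"]))
```

Execution then resumes from the first patched step, with the rest of the original plan left unchanged.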
LEARNING APPLIED TO ERROR CORRECTION

Error correction refers to the ultimate achievement of a goal state after
an initial execution failure. Learning refers to avoiding the failure in
subsequent attempts to achieve similar goals. The combined qualities may be called
adaptation. RECOGNIZER will incorporate the techniques worked out by Srinivas.
Not only are they important in their own right, but they extend the scope of
adaptation possible. The learning techniques already described may be applied
to the recognizable states classified as errors. Our example will once again
center on the action "grasp." Figure 7 shows a second outcome net introduced
for the action "grasp" with an additional class of terminal recognizable states.
The class previously discussed (Fig. 4) ("hard," "fragile," "sticky," "heavy")
does not initiate error correction. The second class ("missed," "position error
normal to finger plane [p.e.n.f.p.]," "left finger touching," etc.), when
recognized, initiates both learning and error correction processes running in
parallel (or simulating parallel processing). Figure 8 introduces a new
semantic net, the object-grasp error net. This net is shown before learning
and contains a set of tests containing only defaults pointing to the terminal
category, "no error." With such a net we can discuss a learning scenario.
Suppose that the robot attempts to grasp a rock from above, fails to maintain a
briefly attained grasp, and multiple outcome analysis discovers that a "position
error normal to the finger plane" exists (Fig. 9). Such a "squeezing out"
error occurs frequently. If the robot can categorize shapes such as
"wedge-shaped" or "hemispherical," the partial matching process (HR-1) may discover
that such shapes positioned between the fingers are often associated with
failure. RECOGNIZER will then plant templates for these shapes and for
"manipulator position with respect to the object" at the node pointing to p.e.n.f.p.
Fig. 9. Position error normal to finger plane (P.E.N.F.P.)


77-16 


At this stage of learning, RECOGNIZER will expect to commit a p.e.n.f.p. error
when it encounters such shapes and will find advice in the "object-grasp" error
outcome net on how to modify the robot's actions to correct the error.


LEARNING STIMULUS-RESPONSE PAIRS

Figure 7 indicates where correction strategies are suggested a priori 
for correcting this type of error. For the p.e.n.f.p. error, in order of 
increasing motor complexity the correction strategies are: 

a) rotate plane of sliding vector 

b) change angle of approach of hand from above to side of object 

c) provide support underneath object (use a shovel).

For each shape that produces an expectancy of "grasp" failure, the plan- 
synthesizer can patch in the simplest technique. If it succeeds, an association 
is established between the successful action and the shape template at node N2 
of the object-grasp error net (Fig. 8) . The association may be conveniently 
represented by modifying the structure of the semantic error net, creating a new 
category "p.e.n.f.p. (technique a)" termination, pointed to by the template 
associated with it.* 

If the simplest technique does not succeed, the error correction process
will attempt, in order, the more complex b and c strategies. If they are
successful, corresponding new terminations are created in the error net structure.
In effect, a stimulus-response structure is created combining an arbitrary
perception with a pre-fabricated response. Thus the system may progress from
making motor errors of a particular kind to acquiring a kind of motor skill to
avoid such failures.

*CSA provides for dynamic restructuring of decision nets. 
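
The acquisition of a stimulus-response pair can be sketched in Python. The strategy list restates a) through c) above; the dictionary standing in for the error net, the function `acquire_response`, and the `simulated_robot` callback (which replaces actual execution of the patched plan) are assumptions.

```python
STRATEGIES = ["rotate plane of sliding vector",       # a) simplest
              "approach from side of object",          # b)
              "provide support underneath object"]     # c) most complex

def acquire_response(shape, try_strategy, error_net):
    """Try strategies in order; record the first that works for this shape."""
    for strategy in STRATEGIES:
        if try_strategy(shape, strategy):
            # New termination: the shape template now points at a response.
            error_net[shape] = "p.e.n.f.p. (" + strategy + ")"
            return strategy
    return None

# Hypothetical stand-in for executing the patched plan on the robot.
def simulated_robot(shape, strategy):
    return shape == "wedge-shaped" and strategy.startswith("approach")

net = {}
print(acquire_response("wedge-shaped", simulated_robot, net))
print(net)
```

Only a successful strategy changes the net, so an unsuccessful sequence of attempts leaves the error net as it was.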

RECOGNIZER STATUS 


A small knowledge base has been provided for exercising the
plan-synthesizer, and some demon structures have been implemented, demonstrating
the planner's capacity to produce correct robot action strings and the capacity
of CSA functions to perform reverse search of decision nets. Implementation of
the elements of RECOGNIZER and demonstrations of learning and error correction
with the robot are scheduled for this year.

SUMMARY AND FUTURE DIRECTIONS 

Some features of the RECOGNIZER learning and error correction system
design have been described. Although a few general algorithmic principles can
be pointed out for the processes denoted here by the terms "learning" and
"error correction," the system can be made to work only by studying the actual
contexts in which the JPL robot will find itself and incorporating the necessary
detailed empirical knowledge in its data base. With such a system made
operational, the robot may be able to cope with a less constrained environment than
a laboratory.

A recognizable state is one which matches an environmental perception to
a stored model. Learning is initiated by such a match. The stored model
attributes of recognizable states are independent of what is to be learned. The
effect of learning is to enhance the system's ability to predict a relationship
between a previously unknown perception of the environment and a semantic category
defined by a given recognizable state. When such recognizable states are
errors, the system may learn to modify its motor behavior to avoid committing
them. The system's ability to accomplish learning depends on finding a
description of the perception distinctively different from its previously experienced

By studying the actions associated with error states, reasons for failures
of the action can be stored in advance, and the knowledge can be used to synthesize
failure analyses tailored to the failures actually encountered. We are
considering extensions of the error detection and correction capabilities to encompass
a wider scope of error. For example, if the robot can link detected conditions
like "unsupported" and "above-the-ground" to a model of gravity, it can make
better predictions of where to look if it drops a rock. If it is engaged in a
complex mechanical assembly and has a model of the completed correct assembly
in memory, together with a knowledge of the stage attained, it can detect errors
more context dependent than those tied to the general purpose actions of the
robot behavior repertoire.

During the coming year we plan to integrate RECOGNIZER into the JPL robot's
software system and test its performance in real situations. Such a system,
when dealing with previously unknown objects, will enable the robot eventually
to predict intrinsic properties of the objects related to its own goals and so
achieve those goals more consistently.